{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# LLMs on RyzenAI with TurnkeyML\n",
    "\n",
    "This notebook demonstrates how to bring up an example application that uses RyzenAI to perform LLM inference. We use the `turnkeyml.llm` APIs to make this as quick as possible. The notebook makes use of both the `RyzenAI NPU` and the `RyzenAI Radeon integrated GPU (iGPU)`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Application\n",
    "\n",
    "Our example application will prompt the user for input and then return the LLM's response. This is the same technology stack used to create AMD GAIA, which shows how Retrieval Augmented Generation (RAG), agentic workflows, and other advanced techniques can be layered on top of RyzenAI."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define our application: prompt the user and print the LLM's response\n",
    "# We define this in a way that makes the NPU and iGPU interchangeable\n",
    "\n",
    "def application(model, tokenizer):\n",
    "    while True:\n",
    "        # Prompt the user\n",
    "        user_prompt = input(\"What is your prompt to the LLM? \")\n",
    "        print(\"Prompt:\", user_prompt)\n",
    "\n",
    "        # Exit the application if the user prompts \"exit\"\n",
    "        if user_prompt == \"exit\":\n",
    "            print(\"Done!\")\n",
    "            return\n",
    "\n",
    "        # Tokenize the user's prompt\n",
    "        input_ids = tokenizer(user_prompt, return_tensors=\"pt\").input_ids\n",
    "\n",
    "        # Generate the response\n",
    "        # Limit the response length to 30 tokens so that we have time to\n",
    "        # try a few prompts\n",
    "        llm_response = model.generate(input_ids, max_new_tokens=30)\n",
    "\n",
    "        # Decode the response into text\n",
    "        decoded_response = tokenizer.decode(llm_response[0])\n",
    "\n",
    "        # Print the response, then prompt for another input\n",
    "        print(\"Response:\", decoded_response)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## RyzenAI NPU Model Initialization\n",
    "\n",
    "### Prerequisites for NPU\n",
    "\n",
    "- The `ryzenai-transformers` conda environment is installed and activated.\n",
    "- Access to `meta-llama/Llama-2-7b-chat-hf` on Hugging Face.\n",
    "- TurnkeyML-LLM is installed in your `ryzenai-transformers` environment; see https://github.com/onnx/turnkeyml/tree/main/src/turnkeyml/llm/README.md#install-ryzenai-npu\n",
    "- Jupyter is installed in your `ryzenai-transformers` environment (`pip install jupyter`).\n",
    "\n",
    "### Starting Up\n",
    "\n",
    "- Run `conda activate ryzenai-transformers`\n",
    "- Run `setup_phx.bat` or `setup_stx.bat`, depending on whether your system is PHX or STX (RyzenAI 7000 or RyzenAI 300, respectively)\n",
    "- Run `jupyter notebook`\n",
    "- Copy the URL printed by the previous command and use it as the kernel for this notebook.\n",
    "  - Example:\n",
    "    > Or copy and paste one of these URLs:\n",
    "    >\n",
    "    >      http://localhost:8888/tree?token=14796c43ce39ef9a3296b7c7c26335e01f7bdc8b0fd4efce\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the turnkey APIs\n",
    "from turnkeyml.llm import leap\n",
    "\n",
    "# Load the model onto the RyzenAI NPU\n",
    "# NOTE: this takes a couple of minutes, but after you've done it once\n",
    "#   you can keep reusing the `model` instance in subsequent notebook cells\n",
    "npu_model, npu_tokenizer = leap.from_pretrained(\n",
    "    \"meta-llama/Llama-2-7b-chat-hf\", recipe=\"ryzenai-npu\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## NPU Application"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run the application on NPU\n",
    "# Enter \"exit\" at the prompt to stop the application\n",
    "application(npu_model, npu_tokenizer)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Radeon iGPU Initialization\n",
    "\n",
    "### Prerequisites for iGPU\n",
    "\n",
    "- `turnkeyml[llm-oga-dml]` is installed in an activated conda environment.\n",
    "- A copy of `Phi-3-mini` has been downloaded.\n",
    "- See https://github.com/onnx/turnkeyml/tree/main/src/turnkeyml/llm/README.md#install-onnxruntime-genai for details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the turnkey APIs\n",
    "from turnkeyml.llm import leap\n",
    "\n",
    "# Load the model on iGPU\n",
    "igpu_model, igpu_tokenizer = leap.from_pretrained(\n",
    "    \"microsoft/Phi-3-mini-4k-instruct\", recipe=\"oga-dml-igpu\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Radeon iGPU Application"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run the application on iGPU\n",
    "# Enter \"exit\" at the prompt to stop the application\n",
    "application(igpu_model, igpu_tokenizer)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
